Re-using an Argument Corpus to Aid in the Curation of Social Media Collections

Authors

  • Clare Llewellyn
  • Claire Grover
  • Jon Oberlander
  • Ewan Klein

Abstract

This work investigates how automated methods can be used to classify social media text into argumentation types. In particular, it shows how supervised machine learning was used to annotate a Twitter dataset (London Riots) with argumentation classes. An investigation of issues arising from a natural inconsistency within social media data found that machine learning algorithms tend to overfit because Twitter contains a great deal of repetition in the form of retweets. It is also noted that, when learning argumentation classes, the classes are likely to be of very different sizes, and this must be kept in mind when analysing the results. Encouraging results were found when adapting a model from one domain of Twitter data (London Riots) to another (OR2012). In this cross-dataset setting, the most useful feature was punctuation. This is probably because punctuation in Twitter language, in particular its very specific use in links, is indicative of argumentation class.
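
The abstract raises three practical concerns:

- Retweets add repeated text to the training data.
- The class sizes are unequal.
- Punctuation works as a transferable feature.

The Python sketch below illustrates each one under stated assumptions. It is not the authors' pipeline. The following are all hypothetical choices made for illustration:

- the rule for stripping an `RT @user:` prefix;
- the function names `normalise`, `deduplicate`, `punctuation_features` and `class_proportions`;
- the use of raw per-character punctuation counts as features.

```python
import re
import string
from collections import Counter

# Hypothetical normalisation rule (an assumption, not from the paper):
# a leading "RT @user:" prefix marks a retweet of the following text.
_RT_PREFIX = re.compile(r"^RT\s+@\w+:?\s*", re.IGNORECASE)


def normalise(tweet: str) -> str:
    """Drop an 'RT @user:' prefix, lower-case, and collapse whitespace."""
    text = _RT_PREFIX.sub("", tweet.strip())
    return " ".join(text.lower().split())


def deduplicate(tweets: list[str]) -> list[str]:
    """Keep only the first tweet for each normalised text, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for tweet in tweets:
        key = normalise(tweet)
        if key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique


def punctuation_features(tweet: str) -> dict[str, int]:
    """Count each ASCII punctuation character (e.g. ':', '/', '.' in links)."""
    return dict(Counter(ch for ch in tweet if ch in string.punctuation))


def class_proportions(labels: list[str]) -> dict[str, float]:
    """Fraction of examples per class, to make class imbalance visible."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    tweets = [
        "Riot police in Tottenham http://t.co/abc",
        "RT @bbc: Riot police in Tottenham http://t.co/abc",
        "Is this true?",
    ]
    print(deduplicate(tweets))  # the retweet is dropped
    print(punctuation_features("http://t.co/abc?"))
    print(class_proportions(["a", "a", "a", "b"]))
```

Deduplicate before any train/test split. This keeps the same retweeted text from appearing on both sides of the split, which is one plausible way to reduce the overfitting the abstract describes. `class_proportions` only reports the label distribution, so imbalance can be checked before results are interpreted.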


Similar resources

Creating Stories from Socially Curated Microblog Messages

SUMMARY Social media such as microblogs have become so pervasive such that it is now possible to use them as sensors for real-world events and memes. While much recent research has focused on developing automatic methods for filtering and summarizing these data streams, we explore a different trend called social curation. In contrast to automatic methods, social curation is characterized as a h...

Curate Me! Exploring online identity through social curation in networked learning

Networked learning theory and the related literature express the importance of access to resources or content, but there is no singular way of discussing these information management processes. On the web, the rise in information abundance has seen the terms curation, digital curation, content curation, and social curation gain in popularity to describe how individual users manage their informa...

Migration Performance for Legacy Data Access

We present performance data relating to the use of migration in a system we are creating to provide web access to heterogeneous document collections in legacy formats. Our goal is to enable sustained access to collections such as these when faced with increasing obsolescence of the necessary supporting applications and operating systems. Our system allows searching and browsing of the original ...

Creating Stories: Social Curation of Twitter Messages

Social media has become ubiquitous. Tweets and other user-generated content have become so abundant that better tools for information organization are needed in order to fully exploit their potential richness. "Social curation" has recently emerged as a promising new framework for organizing and adding value to social media, complementing the traditional methods of algorithmic search and aggreg...

Using large clinical corpora for query expansion in text-based cohort identification

In light of the heightened problems of polysemy, synonymy, and hyponymy in clinical text, we hypothesize that patient cohort identification can be improved by using a large, in-domain clinical corpus for query expansion. We evaluate the utility of four auxiliary collections for the Text REtrieval Conference task of IR-based cohort retrieval, considering the effects of collection size, the inher...

Publication date: 2014